Scalable Probabilistic Databases with Factor Graphs and MCMC

Authors

  • Michael L. Wick
  • Andrew McCallum
  • Gerome Miklau
Abstract

Incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or treatment of relational algebra operators. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models with structure that changes during inference. MCMC sampling provides efficiency by hypothesizing modifications to possible worlds rather than generating entire worlds from scratch. Queries are then run over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. We also demonstrate our system’s ability to answer relational queries with aggregation, and demonstrate additional scalability through the use of parallelization on a real-world complex model of information extraction. This framework is sufficiently expressive to support probabilistic inference not only for answering queries, but also for inferring missing database content from raw evidence.
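The idea described in the abstract (keep one world in the database, propose small modifications with MCMC, and update query answers from the changed tuples only) can be illustrated with a toy sketch. This is not the authors' system. The single boolean `label` column, the chain-shaped factor graph, the `COUNT` query, and every name below (`log_score_delta`, `run_chain`, `weights`, `coupling`) are assumptions made for illustration.

```python
import math
import random


def log_score_delta(world, i, weights, coupling):
    """Change in unnormalized log-probability if tuple i's label is flipped.

    Only the factors that touch tuple i are evaluated: one unary factor
    and the pairwise "agreement" factors with its chain neighbours.
    """
    old, new = world[i], not world[i]
    delta = weights[i] * (int(new) - int(old))
    for j in (i - 1, i + 1):
        if 0 <= j < len(world):
            delta += coupling * (int(new == world[j]) - int(old == world[j]))
    return delta


def run_chain(weights, coupling, steps, seed=0):
    """Metropolis-Hastings over a single stored world.

    The query is a hypothetical SELECT COUNT(*) WHERE label = TRUE.
    Its answer is kept as a materialized view and updated by +1 or -1
    when a flip is accepted, instead of re-running the query per sample.
    """
    rng = random.Random(seed)
    world = [False] * len(weights)   # the one world held in the "database"
    count = 0                        # materialized view of the COUNT query
    histogram = {}
    for _ in range(steps):
        i = rng.randrange(len(world))            # symmetric proposal: flip one tuple
        delta = log_score_delta(world, i, weights, coupling)
        if delta >= 0 or rng.random() < math.exp(delta):
            world[i] = not world[i]              # accept: modify the world in place
            count += 1 if world[i] else -1       # incremental view maintenance
        histogram[count] = histogram.get(count, 0) + 1
    # Estimated distribution over query answers, plus the final state.
    dist = {answer: n / steps for answer, n in histogram.items()}
    return dist, world, count


if __name__ == "__main__":
    dist, world, count = run_chain([0.5, -0.2, 1.0, 0.0], coupling=0.3, steps=5000)
    for answer in sorted(dist):
        print(answer, round(dist[answer], 3))
```

In this sketch each step touches one tuple and at most three factors, so the cost per sample does not grow with the size of the world. A naive alternative would recompute `sum(world)` after every sample.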


Similar resources

Synthesis: Representing Uncertainty in Databases with Scalable Factor Graphs

Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power or restrict the class of relational algebra formulae under which they are closed. In this paper we propose a probabilistic representation that uses...


Synthesizing Open Worlds with Constraints using Locally Annealed Reversible Jump MCMC

We present a novel Markov chain Monte Carlo (MCMC) algorithm that generates samples from transdimensional distributions encoding complex constraints. We use factor graphs, a type of graphical model, to encode constraints as factors. Our proposed MCMC method, called locally annealed reversible jump MCMC, exploits knowledge of how dimension changes affect the structure of the factor graph. We emp...


Information theory tools to rank MCMC algorithms on probabilistic graphical models

We propose efficient MCMC tree samplers for random fields and factor graphs. Our tree sampling approach combines elements of Monte Carlo simulation as well as exact belief propagation. It requires that the graph be partitioned into trees first. The partition can be generated by hand or automatically using a greedy graph algorithm. The tree partitions allow us to perform exact inference on each ...


Scalable Feature Selection for Large Sized Databases

Feature selection determines relevant features in the data. It is often applied in pattern classification. A special constraint for feature selection nowadays is that the size of a database is normally very large. An effective method is needed to accommodate the practical demands. A scalable probabilistic algorithm is presented here as an alternative to the exhaustive and heuristic approaches. ...


Calibration of Load and Resistance Factors for Reinforced Concrete

The current approach to designing reinforced concrete members is based on load and resistance factors. Although the load and resistance parameters are random variables, constant values are assigned to them in the design procedure. Treating these factors as constants leads to unsafe and uneconomical designs. Safe design of structures requires appropriate recog...


Journal title:
  • PVLDB

Volume 3  Issue 

Pages  -

Publication date 2010